MATRIX REPRESENTATIONS AND INDEPENDENCIES IN
DIRECTED ACYCLIC GRAPHS

By Giovanni M. Marchetti and Nanny Wermuth
University of Florence and Chalmers/Göteborgs Universitet

For a directed acyclic graph, there are two known criteria to 
decide whether any specific conditional independence statement is 
implied for all distributions factorized according to the given graph. 
Both criteria are based on special types of path in graphs. They are 
called separation criteria because independence holds whenever the 
conditioning set is a separating set in a graph theoretical sense. We 
introduce and discuss an alternative approach using binary matrix 
representations of graphs in which zeros indicate independence state- 
ments. A matrix condition is shown to give a new path criterion for 
separation and to be equivalent to each of the previous two path 
criteria. 

1. Introduction. We consider stepwise processes for generating joint
distributions of random variables $Y_i$ for $i = 1, \ldots, d$, starting with the
marginal density of $Y_d$, proceeding with the conditional density of $Y_{d-1}$
given $Y_d$, up to $Y_1$ given $Y_2, \ldots, Y_d$. The conditional densities are of
arbitrary form but have the independence structure defined by an associated
directed acyclic graph in $d$ nodes, in which node $i$ represents variable $Y_i$.
Furthermore, an arrow starting at node $j > i$ and pointing to $i$, the offspring
of $j$, indicates a nonvanishing conditional dependence of $Y_i$ on $Y_j$. Node $j$
is then called a parent node of $i$, the set of parent nodes is denoted by
$\mathrm{par}_i \subseteq \{i+1, \ldots, d\}$, and the graph together with the complete
ordering of the nodes as $V = (1, \ldots, d)$ is the parent graph $G^V_{\mathrm{par}}$.
The joint density $f_V(y)$ of the $d \times 1$ random variable $Y$ factorizes as

(1.1) $f_V(y) = \prod_{i=1}^{d} f_{i|\mathrm{par}_i}(y_i \mid y_{\mathrm{par}_i}),$

where $f_{i|\mathrm{par}_i}(y_i \mid y_{\mathrm{par}_i}) = f_i(y_i)$ whenever $\mathrm{par}_i$ is empty.

For each node $j > i$, not a parent of $i$, the factorization (1.1) implies that
the $ij$-arrow is missing in the graph and that $Y_i$ is conditionally independent
of $Y_j$, given $Y_{\mathrm{par}_i}$. The defining list of independencies for
$G^V_{\mathrm{par}}$, written in terms of nodes, is

(1.2) $i \perp\!\!\!\perp j \mid \mathrm{par}_i$ for $j > i$ not a parent of $i$ and $i = 1, \ldots, d-1$.

In general, for any disjoint subsets $\alpha$, $\beta$ and $C$ of $V$ we denote
$Y_\alpha$ conditionally independent of $Y_\beta$ given $Y_C$ by
$\alpha \perp\!\!\!\perp \beta \mid C$. The parent graph is a
special type of independence graph, that is, a graph for which each missing
edge corresponds to an independence statement. The independence structure
captured by a graph consists of the list of independences defining the graph
and of all other independences that derive from this list.

For instance, the graph $G^V_{\mathrm{par}}$ typically captures more independence
statements than those given directly by the defining list (1.2). We take as an
example the following graph representing a Markov chain in four nodes:

$1 \leftarrow 2 \leftarrow 3 \leftarrow 4.$

The defining independence for node 1 is $1 \perp\!\!\!\perp \{3,4\} \mid 2$, but for
instance the additional independence statement $1 \perp\!\!\!\perp 4 \mid 3$ also holds.
We say that a distribution is generated over $G^V_{\mathrm{par}}$ if its density
$f_V(y)$ is obtained by the stepwise process described above so that it
factorizes as in (1.1) and its set of independence constraints is fully
captured by $G^V_{\mathrm{par}}$. Thus, $f_V(y)$ satisfies precisely
the independences that derive from (1.2) and no others.
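The defining list (1.2) can be enumerated mechanically from the parent sets. The following is a minimal sketch, not part of the original development; the function name `defining_independencies` and the dictionary encoding of the parent sets are assumptions made for illustration only.

```python
def defining_independencies(parents, d):
    """Return triples (i, j, par_i), each read as i _||_ j | par_i, as in list (1.2).

    `parents` maps a node i to its parent nodes (all larger than i);
    this encoding is a hypothetical choice for the sketch.
    """
    statements = []
    for i in range(1, d):                      # i = 1, ..., d-1
        par_i = sorted(parents.get(i, ()))
        for j in range(i + 1, d + 1):          # only nodes j > i can be parents of i
            if j not in par_i:                 # missing ij-arrow
                statements.append((i, j, par_i))
    return statements


# Markov chain 1 <- 2 <- 3 <- 4 from the text.
chain = {1: [2], 2: [3], 3: [4]}
for i, j, par_i in defining_independencies(chain, 4):
    print(f"{i} _||_ {j} | {par_i}")
```

For the Markov chain, the pairwise statements for node 1 are those for $j = 3$ and $j = 4$ given node 2, which together correspond to the set statement for node 1 quoted above. The induced statement of node 1 and node 4 given node 3 is not in this list; it has to be derived.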

Methods have been developed to decide for any nonempty sets $\alpha$ and $\beta$
whether a given parent graph implies that $\alpha \perp\!\!\!\perp \beta \mid C$ holds
for all distributions generated over it. Such methods have been called
separation criteria because they check if the conditioning set $C$ is a
separating set, in the sense of graph theory. Two quite different but
equivalent criteria have been derived. Both are based on special types of
paths in independence graphs. The first, by Geiger, Verma and Pearl [5], has
been called d-separation, because it is applied to a directed acyclic graph.
The second, by Lauritzen et al. [6], uses the basic graph theoretic notion of
separation in an undirected graph. Such a graph, derived from
$G^V_{\mathrm{par}}$, has been named a moral graph.

In this paper we use a different approach by associating first a joint
Gaussian distribution of $Y$ with $G^V_{\mathrm{par}}$. For this distribution, the
list of independencies (1.2) is equivalent to a set of zero population
least-squares regression coefficients, that is, to a set of linear
independencies

$i \perp\!\!\!\perp j \mid \mathrm{par}_i \iff \Pi_{i|j.\mathrm{par}_i} = 0,$

where $\Pi_{i|j.\mathrm{par}_i}$ is an adaption of the Yule-Cochran notation for the
regression coefficient of $Y_j$, here in linear least-squares regression of
$Y_i$ on $Y_{\mathrm{par}_i}$ and $Y_j$ for $j > i$ not a parent of $i$.

There are then two key results: (a) in distributions of arbitrary form
generated over $G^V_{\mathrm{par}}$, probabilistic independence statements combine
in the same way as linear independence statements; and (b) special types of
path in $G^V_{\mathrm{par}}$ lead to dependence of $Y_\alpha$ on $Y_\beta$, given
$Y_C$, in the relevant family of Gaussian distributions generated over a given
parent graph, that is, in Gaussian distributions with parameters constrained
only by the defining list of independencies (1.2) and having nonvanishing
dependence of $Y_i$ on $Y_j$, given $Y_{\mathrm{par}_i \setminus j}$, for every
$i \leftarrow j$ arrow present in the parent graph.

In any Gaussian distribution of $Y$, $\alpha \perp\!\!\!\perp \beta \mid C$ holds if
and only if the population coefficient matrix of $Y_\beta$ in linear
least-squares regression of $Y_\alpha$ on both $Y_\beta$ and $Y_C$ is zero. This
matrix is related to linear equations associated with $G^V_{\mathrm{par}}$ using a
generalization of the sweep operator for symmetric matrices [4] called
partial inversion.

With another operator for binary matrices, named partial closure, so-called
structural zeros in this matrix are expressed in terms of a special
binary matrix representation derived from the parent graph. A particular
zero submatrix will be shown to imply that $\alpha \perp\!\!\!\perp \beta \mid C$ holds
in all distributions generated over $G^V_{\mathrm{par}}$.

This matrix criterion leads to a further path-based criterion for separa- 
tion in directed acyclic graphs. Finally, equivalence of the new criterion to 
each of the two known separation criteria is established after having given 
first equivalent matrix formulations to each of these latter two path-based 
criteria. 



2. Edge matrices and induced independence statements. Every indepen- 
dence graph has a matrix representation called its edge matrix (see [10, 12]). 
In this paper we are concerned with edge matrices derived from that of the 
parent graph called induced edge matrices. 

2.1. Edge matrix of the parent graph and linear recursive regressions.
The edge matrix of a parent graph is a $d \times d$ upper triangular binary matrix
$\mathcal{A} = (\mathcal{A}_{ij})$ such that

(2.1) $\mathcal{A}_{ij} = \begin{cases} 1, & \text{if and only if } i \leftarrow j \text{ in } G^V_{\mathrm{par}} \text{ or } i = j, \\ 0, & \text{otherwise.} \end{cases}$

This matrix is the transpose of the usual adjacency matrix representation
of $G^V_{\mathrm{par}}$, with additional ones along the diagonal. Because of the
triangular form of the edge matrix $\mathcal{A}$, densities (1.1) generated over
$G^V_{\mathrm{par}}$ are called triangular systems.
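As an illustration of definition (2.1), the edge matrix can be built directly from the parent sets. This is a minimal sketch under the assumption that the parent sets are stored in a dictionary; the function name `edge_matrix` is hypothetical and chosen only for this example.

```python
def edge_matrix(parents, d):
    """Binary d x d edge matrix of a parent graph, following definition (2.1).

    Entry (i, j) is 1 if i <- j in the graph or i == j, and 0 otherwise.
    Nodes are labelled 1, ..., d; list indices are shifted by one.
    """
    return [
        [1 if i == j or j in parents.get(i, ()) else 0 for j in range(1, d + 1)]
        for i in range(1, d + 1)
    ]


# Markov chain 1 <- 2 <- 3 <- 4: parents always have larger labels,
# so the matrix is upper triangular with ones along the diagonal.
chain = {1: [2], 2: [3], 3: [4]}
for row in edge_matrix(chain, 4):
    print(row)
```

Because every parent of node $i$ has a label larger than $i$, all ones off the diagonal lie above it, which is the triangular form referred to in the text.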
If the mean-centred random vector $Y$ generated over the parent graph has
a joint Gaussian distribution, then each factor
$f_{i|\mathrm{par}_i}(y_i \mid y_{\mathrm{par}_i})$ of equation (1.1) is a linear
least-squares regression

$Y_i = \Pi_{i|\mathrm{par}_i} Y_{\mathrm{par}_i} + \varepsilon_i,$

where the residuals $\varepsilon_i$ are mutually independent with zero mean and
variance $\sigma_{ii|\mathrm{par}_i}$. Then, the joint model can be written in the form

(2.2) $AY = \varepsilon,$

where $A$ is a real-valued $d \times d$ upper triangular matrix with ones along the
diagonal. The $d \times 1$ vector $\varepsilon$ has zero mean and
$\mathrm{cov}(\varepsilon) = \Delta$, a diagonal matrix with elements
$\Delta_{ii} = \sigma_{ii|\mathrm{par}_i} > 0$. The covariance and concentration matrix
of $Y$ are then, respectively, $\Sigma = A^{-1}\Delta A^{-T}$ and
$\Sigma^{-1} = A^{T}\Delta^{-1}A$. An upper off-diagonal element of the generating
matrix $A$ is $A_{ij} = -\Pi_{i|j.\mathrm{par}_i \setminus j} \neq 0$ if the $ij$-arrow is
present in the parent graph and $A_{ij} = 0$ if the $ij$-arrow is missing in
$G^V_{\mathrm{par}}$.

In this Gaussian model, there is a one-to-one correspondence between
missing $ij$-edges in the parent graph and zero parameters $A_{ij} = 0$. As a
consequence, any such zero coincides in this case with a structural zero,
that is, a zero that holds for the relevant family of Gaussian distributions
generated over $G^V_{\mathrm{par}}$. Therefore, the edge matrix $\mathcal{A}$ of the
parent graph can be interpreted as the indicator matrix of zeros in $A$, that
is, $\mathcal{A} = \mathrm{In}[A]$, where the $\mathrm{In}[\cdot]$ operator transforms
every nonzero entry of a matrix to be equal to one.

The edge matrix $\mathcal{A}$ of $G^V_{\mathrm{par}}$ is the starting point to find
induced conditional independencies satisfied by all distributions generated
over the parent graph. As we shall see, for any given Gaussian distribution
generated over $G^V_{\mathrm{par}}$, independence statements are reflected by zeros
in a matrix derived from $A$. Some of these zeros may be due to specific
parametric constellations, others are consequences of the defining list of
independencies (1.2). These latter zeros, that is, the structural zeros in
this matrix, show as zeros in edge matrices derived from $\mathcal{A}$, which in
turn are matrix representations of induced independence graphs.

2.2. Independence and structural zeros. For any partitioning of the vertex
set $V$ into node sets $M, \alpha, \beta, C$, where only $M$ and $C$ may be empty, a
joint density that factorizes according to (1.1) is now considered in the form

$f_V = f_{M|\alpha\beta C}\, f_{\alpha|\beta C}\, f_{\beta|C}\, f_C.$

Marginalizing over $M$, as well as conditioning on $C$ and removing the
corresponding variables, gives $f_{\alpha\beta|C} = f_{\alpha|\beta C} f_{\beta|C}$, so
that $\alpha \perp\!\!\!\perp \beta \mid C$ holds if and only if
$f_{\alpha|\beta C} = f_{\alpha|C}$.
For a Gaussian distribution, the independence
$\alpha \perp\!\!\!\perp \beta \mid C$ is equivalently captured by

$\Pi_{\alpha|\beta.C} = 0 \iff \Sigma_{\alpha\beta|C} = 0,$

where $\Pi_{\alpha|\beta.C}$ denotes the coefficient matrix of $Y_\beta$ in linear
least-squares regression of $Y_\alpha$ on $Y_\beta$ and $Y_C$, and
$\Sigma_{\alpha\beta|C}$ the joint conditional covariance matrix of $Y_\alpha$ and
$Y_\beta$ given $Y_C$ (see Appendix A.2). The essential point is then
that independence of $Y_\alpha$ and $Y_\beta$, given $Y_C$, is implied for all
distributions generated over $G^V_{\mathrm{par}}$ if and only if for Gaussian models
(2.2) the induced coefficient matrix $\Pi_{\alpha|\beta.C}$ is implied to vanish
for all permissible values of the parameters.

With $\alpha = a \setminus M$ and $\beta = b \setminus C$, we have
$[\Pi_{a|b}]_{\alpha,\beta} = \Pi_{\alpha|\beta.C}$. Therefore, the specific form of
$\Pi_{\alpha|\beta.C}$ implied by $A$ in equation (2.2) can be derived by linear
least-squares regression of $Y_a$ on $Y_b$, and the independence structure
implied for it by $\mathcal{A}$ in equation (2.1) in a related way. Before turning
to these tasks, we summarize how linear independence statements relate to
probabilistic independencies specified with a parent graph.

2.3. Combining independencies in triangular systems of densities. It has
been noted by Smith ([8], Example 2.8) that probabilistic and linear
independencies combine in the same way. We prove a similar property, using
assertions that have been discussed recently by Studený [9], where we take
$V$ to be partitioned into $a, b, c, d$.

Lemma 1. In densities of arbitrary form generated over $G^V_{\mathrm{par}}$,
conditional independence statements combine as in a nondegenerate Gaussian
distribution. This means that they satisfy the following statements, where
we write, for instance, $bc$ for the union of $b$ and $c$:

(i) symmetry: $a \perp\!\!\!\perp b \mid c$ implies $b \perp\!\!\!\perp a \mid c$;

(ii) decomposition: $a \perp\!\!\!\perp bc \mid d$ implies $a \perp\!\!\!\perp b \mid d$;

(iii) weak union: $a \perp\!\!\!\perp bc \mid d$ implies $a \perp\!\!\!\perp b \mid cd$;

(iv) contraction: $a \perp\!\!\!\perp b \mid c$ and $a \perp\!\!\!\perp d \mid bc$ imply $a \perp\!\!\!\perp bd \mid c$;

(v) intersection: $a \perp\!\!\!\perp b \mid cd$ and $a \perp\!\!\!\perp c \mid bd$ imply $a \perp\!\!\!\perp bc \mid d$;

(vi) composition: $a \perp\!\!\!\perp c \mid d$ and $b \perp\!\!\!\perp c \mid d$ imply $ab \perp\!\!\!\perp c \mid d$.

Proof. The first four statements are basic properties of probabilities
(see, e.g., [2]). Densities (1.1) generated over a parent graph also satisfy
properties (v) and (vi), due to the full ordering of the node set, the
$ij$-dependence if and only if there is an $ij$-arrow in $G^V_{\mathrm{par}}$, and
the lack of any other constraint on the density.

For (v), the generating process implies for two nodes $i < j$ that of
$a \perp\!\!\!\perp i \mid jd$ and $a \perp\!\!\!\perp j \mid id$, the statement
$a \perp\!\!\!\perp j \mid id$ is not in the defining list (1.2) unless
there are additional independencies. If $a \perp\!\!\!\perp j \mid id$ is to be
satisfied, then at least $a \perp\!\!\!\perp j \mid d$ has to be in the defining
list as well. And, in this case,
$f_{ij|ad} = f_{i|jd} f_{j|d} = f_{ij|d}$, so that $a \perp\!\!\!\perp ij \mid d$ is
implied.

For (vi) and again $i < j$, both of $i \perp\!\!\!\perp c \mid d$ and
$j \perp\!\!\!\perp c \mid d$ can only be in the defining list of independencies if
the statement $i \perp\!\!\!\perp c \mid jd$ is also satisfied. And, in this case,
$f_{ij|cd} = f_{i|jd} f_{j|d} = f_{ij|d}$, so that $ij \perp\!\!\!\perp c \mid d$ is
implied. Equivalence of each of the assertions to statements involving
least-squares coefficient matrices, proved in Lemma A.1, Appendix A.2,
completes the proof. □

2.4. Partially inverted concentration matrices. The induced parameter
matrix $\Pi_{\alpha|\beta.C}$ is to be expressed in terms of the original
parametrization $(A, \Delta)$. This is achieved in terms of the matrix operator
partial inversion (see Appendix A.1 for a detailed summary of some of its
properties).

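Partial inversion with respect to any rows and columns $a$ of the
concentration matrix, partitioned as $(a, b)$, transforms $\Sigma^{-1}$ into
$\mathrm{inv}_a \Sigma^{-1}$,

(2.3) $\Sigma^{-1} = \begin{pmatrix} \Sigma^{aa} & \Sigma^{ab} \\ \cdot & \Sigma^{bb} \end{pmatrix}, \qquad \mathrm{inv}_a \Sigma^{-1} = \begin{pmatrix} \Sigma_{aa|b} & \Pi_{a|b} \\ \sim & \Sigma^{bb.a} \end{pmatrix},$

where the $\cdot$ notation indicates entries in a symmetric matrix, and the $\sim$
notation denotes entries in a matrix that is symmetric except for the sign.
The submatrix $\Pi_{a|b}$ is as defined before; submatrix
$\Sigma_{aa|b} = (\Sigma^{aa})^{-1}$ is the covariance matrix of
$Y_a - \Pi_{a|b} Y_b$, and submatrix $\Sigma^{bb.a} = \Sigma_{bb}^{-1}$ is the marginal
concentration matrix of $Y_b$. We denote by $\tilde{A}$ the accordingly
partitioned matrix $A$ in which the original order is preserved both within
$a$ and within $b$, but which is typically asymmetric and not triangular.

An important property of partial inversion is that

$\tilde{A} \begin{pmatrix} Y_a \\ Y_b \end{pmatrix} = \begin{pmatrix} \varepsilon_a \\ \varepsilon_b \end{pmatrix} \quad \text{implies} \quad \mathrm{inv}_a \tilde{A} \begin{pmatrix} \varepsilon_a \\ Y_b \end{pmatrix} = \begin{pmatrix} Y_a \\ \varepsilon_b \end{pmatrix},$

so that, with $B = \mathrm{inv}_a \tilde{A}$, one obtains directly the equations in
$Y_b$ from which $Y_a$ has been removed as

(2.4) $B_{bb} Y_b = \eta_b, \qquad \eta_b = \varepsilon_b - B_{ba}\varepsilon_a.$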
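In addition, direct covariance computations give

$W = \mathrm{cov}\begin{pmatrix} \varepsilon_a \\ \eta_b \end{pmatrix} = \begin{pmatrix} \Delta_{aa} & -\Delta_{aa} B_{ba}^{T} \\ \cdot & \Delta_{bb} + B_{ba}\Delta_{aa}B_{ba}^{T} \end{pmatrix}.$

Therefore, equations in $Y_a$ corrected for linear dependence on $Y_b$ and having
residuals uncorrelated with $\eta_b$ are, with $H = \mathrm{inv}_b W$,

(2.5) $Y_a - \Pi_{a|b} Y_b = B_{aa}\eta_a, \qquad \eta_a = \varepsilon_a - H_{ab}\eta_b.$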

Lemma 2 below is now a direct consequence of the definition of $H$ and
equations (2.4) and (2.5). It leads, after expansion of the matrix $H$, to an
explicit expression of the matrix of least-squares regression coefficients
$\Pi_{a|b}$ as a function of $\Delta$ and $B$, used later for Proposition 2.
Similarly, the explicit expression of $\Sigma_{aa|b}$ will be used for
Proposition 3 and of $\Sigma^{bb.a}$ for Proposition 4.



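Lemma 2 (Wermuth and Cox [10]). For a linear triangular system (2.2),
with $\Sigma = \mathrm{cov}(Y)$, $a$ any subset of $V$, $b = V \setminus a$ and $H$ as
defined for equations (2.4), (2.5),

(2.6) $\mathrm{inv}_a \Sigma^{-1} = \begin{pmatrix} B_{aa} H_{aa} B_{aa}^{T} & B_{ab} + B_{aa} H_{ab} B_{bb} \\ \sim & B_{bb}^{T} H_{bb} B_{bb} \end{pmatrix}.$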



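2.5. Induced edge matrices. To obtain induced edge matrices, we display
first explicitly $B = \mathrm{inv}_a \tilde{A}$ and
$\mathcal{B} = \mathrm{zer}_a \tilde{\mathcal{A}}$, the latter obtained by what is
called the operator of partial closure (see Appendix A.1 for some of its
properties). It finds both the structural zeros in $B$ and the edge matrix
induced by $\mathcal{A}$ for what we define below as the partial ancestor graph,
with respect to subset $a$ of node set $V$:

(2.7) $B = \begin{pmatrix} A_{aa}^{-1} & -A_{aa}^{-1}A_{ab} \\ A_{ba}A_{aa}^{-1} & A_{bb} - A_{ba}A_{aa}^{-1}A_{ab} \end{pmatrix}, \qquad \mathcal{B} = \mathrm{In}\begin{pmatrix} \mathcal{A}_{aa}^{-} & \mathcal{A}_{aa}^{-}\mathcal{A}_{ab} \\ \mathcal{A}_{ba}\mathcal{A}_{aa}^{-} & \mathcal{A}_{bb} + \mathcal{A}_{ba}\mathcal{A}_{aa}^{-}\mathcal{A}_{ab} \end{pmatrix},$

with

$\mathcal{A}_{aa}^{-} = \mathrm{In}[(k I_{aa} - \mathcal{A}_{aa})^{-1}],$

where $I_{aa}$ denotes an identity matrix of dimension $d_a$ and $k = d_a + 1$. The
matrix $\mathcal{A}_{aa}^{-}$ provides the structural zeros in $A_{aa}^{-1}$ and the
edge matrix of the transitive closure of the graph with edge matrix
$\mathcal{A}_{aa}$, for which fast algorithms are also available (see [3]).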
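The claim that the indicator of $(kI_{aa} - \mathcal{A}_{aa})^{-1}$ with $k = d_a + 1$ gives the transitive closure can be checked numerically on small examples. The sketch below is an illustration only: it uses exact rational arithmetic and a plain Gauss-Jordan inversion, and the function names `inverse`, `closure_by_inverse` and `closure_by_warshall` are assumptions made for this example, not names used by the authors.

```python
from fractions import Fraction


def inverse(M):
    """Exact inverse of a small invertible square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def closure_by_inverse(A):
    """In[(k I - A)^{-1}] with k = dimension + 1, for a binary edge matrix A."""
    n = len(A)
    k = n + 1
    M = [[k * int(i == j) - A[i][j] for j in range(n)] for i in range(n)]
    return [[int(x != 0) for x in row] for row in inverse(M)]


def closure_by_warshall(A):
    """Reflexive transitive closure by the classical Warshall recursion."""
    n = len(A)
    T = [row[:] for row in A]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if T[i][k] and T[k][j]:
                    T[i][j] = 1
    return T


A = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]     # chain 1 <- 2 <- 3
print(closure_by_inverse(A) == closure_by_warshall(A))
```

For a triangular binary matrix with unit diagonal, $kI - \mathcal{A}$ equals $d_a I$ minus a strictly triangular nonnegative matrix, so its inverse is a finite sum of nonnegative terms and no cancellation can hide a path. In practice a closure algorithm of the Warshall type would be preferred over explicit inversion.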

The transition from a matrix of parameters in a linear system, such as $B$
in equation (2.7), to a corresponding induced edge matrix, $\mathcal{B}$, is
generalized with Lemma 3 below.



Lemma 3 (Wermuth and Cox [11]). Let induced parameter matrices be
defined by parameter components of a linear system of the type $FY = \zeta$ with
possibly correlated residuals $\zeta$, so that the defining matrix products hide
no self-cancellation of an operation such as a matrix multiplied by its
inverse. Further let the structural zeros of $F$ be given by $\mathcal{F}$. Then,
the induced edge matrix components are obtained by replacing, in a given sum
of products:

(i) every inverse matrix, say $F^{-1}$, by the binary matrix of its structural
zeros, $\mathcal{F}^{-}$;

(ii) every diagonal matrix by an identity matrix of the same dimension;

(iii) every other submatrix, say $-F_{ab}$ or $F_{ab}$, by the corresponding binary
submatrix of structural zeros, $\mathcal{F}_{ab}$;

and then applying the indicator function.

By using Lemma 3, each submatrix of a linear parameter matrix is
substituted by a nonnegative matrix having the appropriate structural zeros.
By multiplying, summing and applying the indicator function, all structural
zeros are preserved and no additional zeros are generated, while some
structural zeros present in $\mathcal{F}$ may be changed to ones.

After applying Lemma 3 to equation (2.6), the edge matrix components
induced by $\mathcal{A}$ for $\mathrm{inv}_a \Sigma^{-1}$ result. These components are
$\mathcal{P}_{a|b}$ of $\Pi_{a|b}$, $\mathcal{S}_{aa|b}$ of $\Sigma_{aa|b}$ and
$\mathcal{S}^{bb.a}$ of $\Sigma^{bb.a}$.

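Lemma 4 (Wermuth, Wiedenbeck and Cox [12]). The edge matrix components
induced by a parent graph for $\mathrm{inv}_a \Sigma^{-1}$ in (2.3) are

(2.8) $\begin{pmatrix} \mathcal{S}_{aa|b} & \mathcal{P}_{a|b} \\ \cdot & \mathcal{S}^{bb.a} \end{pmatrix} = \mathrm{In}\begin{pmatrix} \mathcal{B}_{aa}\mathcal{H}_{aa}\mathcal{B}_{aa}^{T} & \mathcal{B}_{ab} + \mathcal{B}_{aa}\mathcal{H}_{ab}\mathcal{B}_{bb} \\ \cdot & \mathcal{B}_{bb}^{T}\mathcal{H}_{bb}\mathcal{B}_{bb} \end{pmatrix},$

where

$\mathcal{H} = \mathrm{zer}_b \mathcal{W}, \qquad \mathcal{W} = \mathrm{In}\begin{pmatrix} I_{aa} & \mathcal{B}_{ba}^{T} \\ \cdot & I_{bb} + \mathcal{B}_{ba}\mathcal{B}_{ba}^{T} \end{pmatrix}.$

Lemma 4 leads to the following matrix criteria for independencies implied
by a parent graph, where $V$ is partitioned as before into $M, \alpha, \beta, C$.

Proposition 1. The parent graph $G^V_{\mathrm{par}}$ implies, for every density
generated over it, that:

(i) $\alpha \perp\!\!\!\perp \beta \mid C$ holds if and only if $[\mathcal{P}_{a|b}]_{\alpha,\beta} = \mathcal{P}_{\alpha|\beta.C} = 0$;

(ii) $\alpha \perp\!\!\!\perp M \mid b$ holds if and only if $\mathcal{S}_{\alpha M|b} = 0$;

(iii) $\beta \perp\!\!\!\perp C$ holds if and only if $\mathcal{S}^{\beta C.a} = 0$.

Proof. For (i), if there is a one for $i, j$ in $\mathcal{P}_{\alpha|\beta.C}$, then
$Y_i$ is conditionally dependent on $Y_j$ given $Y_{C\beta \setminus j}$ in the
relevant family of Gaussian densities generated over $G^V_{\mathrm{par}}$ (see,
e.g., [11]). If, instead, the $ij$-edge is missing, then
$i \perp\!\!\!\perp j \mid C\beta \setminus j$ is implied for every member of the family
of Gaussian distributions. For $\mathcal{P}_{\alpha|\beta.C} = 0$, the statement
$\alpha \perp\!\!\!\perp \beta \mid C$ results with Lemma 1 for every distribution
generated over $G^V_{\mathrm{par}}$. For (ii) and (iii), the arguments are
analogous. □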
3. A path-based interpretation of the matrix criterion. To give a path
interpretation of the matrix criterion $\mathcal{P}_{\alpha|\beta.C} = 0$, we first
summarize some definitions related to paths and graphs.

Two nodes $i$ and $j$ in an independence graph have at most one edge. If
the $ij$-edge is present in the graph, then the node pair $i, j$ is coupled; if
the $ij$-edge is missing, the node pair is uncoupled. An $ij$-path connects the
path endpoints $i$ and $j$ by a sequence of edges visiting distinct nodes. All
nodes of a path except for the endpoint nodes are called the inner nodes of
the path. An edge is regarded as a path without inner nodes. For a graph in
node set $V$ and $a \subset V$, the subgraph induced by $a$ is obtained by removing
all nodes and edges outside $a$.

Both a graph and a path are called directed if all its edges are arrows.
Directed graphs can have the following V-configurations, that is, subgraphs
induced by three nodes and having two edges,

$i \leftarrow t \leftarrow j, \qquad i \leftarrow s \rightarrow j, \qquad i \rightarrow c \leftarrow j,$

where the inner node is called a transition ($t$), a source ($s$) and a
collision node ($c$), respectively. A directed path is direction-preserving if
all its inner nodes are transition nodes. If in a direction-preserving path
an arrow starts at node $j$ and points to $i$, then node $j$ is an ancestor of
$i$, node $i$ a descendant of $j$, and the $ij$-path is called a
descendant-ancestor path.

Node $j$ is an $a$-line ancestor of node $i$ if all inner nodes in the
descendant-ancestor $ij$-path are in set $a$. A directed path is an alternating
path if it has at least one inner node and the direction of the arrows
changes at each inner node. This implies that no inner node is a transition
node and that the inner nodes alternate between source and collision nodes.
A parent graph is said to be transitive if it contains no
transition-oriented V-configuration or, equivalently, if for each node $i$ the
set $\mathrm{par}_i$ of its parents coincides with its set of ancestors.

The partial ancestor graph, with respect to nodes $a$, denoted by
$G^a_{\mathrm{anc}}$, is an induced graph defined, for a reordered node set, by the
edge matrix $\mathcal{B} = \mathrm{zer}_a \tilde{\mathcal{A}}$ of equation (2.7). The
elements of $\mathcal{B}$ are equivalently given by

(3.1) $\mathcal{B}_{ij} = \begin{cases} 1, & \text{if and only if } j \text{ is an } a\text{-line ancestor of } i \text{ in } G^V_{\mathrm{par}} \text{ or } i = j, \\ 0, & \text{otherwise,} \end{cases}$

with nodes ordered as for equation (2.7). Since $\mathcal{B}$ in (3.1) implies that
every $a$-line descendant-ancestor path in $G^V_{\mathrm{par}}$ is, in
$G^a_{\mathrm{anc}}$, closed by an arrow that points to the descendant, the
corresponding operator has been named partial closure. Induced edge matrices
and induced linear parameter matrices may be calculated within the
statistical environment R (see [7]).
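Definition (3.1) suggests a simple way of computing the partial ancestor graph: a Warshall-type closure in which only nodes of $a$ may serve as inner nodes. The following is a minimal sketch of that idea, assuming the edge matrix is kept in the original node order rather than reordered as $(a, b)$; the function name `partial_closure` is hypothetical and this is not the R implementation mentioned above.

```python
def partial_closure(A, a):
    """Edge matrix of the partial ancestor graph with respect to node set a.

    A is the d x d binary edge matrix of the parent graph (nodes 1..d,
    list indices shifted by one). Entry (i, j) of the result is 1 if j is
    an a-line ancestor of i or i == j, as in (3.1).
    """
    d = len(A)
    B = [row[:] for row in A]
    for k in sorted(a):                 # only nodes in a may be inner nodes
        for i in range(d):
            for j in range(d):
                if B[i][k - 1] and B[k - 1][j]:
                    B[i][j] = 1         # close the path i <- k <- j by an arrow i <- j
    return B


# Markov chain 1 <- 2 <- 3 <- 4.
A = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
for row in partial_closure(A, {3}):
    print(row)
```

With $a = \{3\}$ only the path $2 \leftarrow 3 \leftarrow 4$ is closed, so that node 4 becomes a 3-line ancestor of node 2, while node 3 does not become an ancestor of node 1 in the induced graph because the connecting path has the inner node 2 outside $a$.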

Figure 1(a) shows a parent graph in six nodes, Figure 1(b) its partial
ancestor graph with respect to $a = \{1,2,3\}$, and Figure 1(c) the induced
graph for the conditional dependence of $Y_a$ given $Y_b$. In this example,
$(a, b)$ is an order compatible partition of the node set $V$, that is, a mere
split of $V = (1, \ldots, d)$ into two components without any reordering. For
such a split, $\mathcal{A}_{ba} = \mathcal{B}_{ba} = 0$ implies
$\mathcal{P}_{a|b} = \mathcal{B}_{ab}$. With a transitive parent graph, in addition,
$\mathcal{P}_{a|b} = \mathcal{A}_{ab}$ for all order compatible partitionings of $V$.

Fig. 2. An alternating path in $G^a_{\mathrm{anc}}$ from $b$ to $a$; active since, of
its inner nodes, each source node is in $a$ and each collision node in
$b = V \setminus a$; nodes in $a$ are shown crossed out, those in $b$ boxed in.

If, instead, $a$ is an arbitrary subset of $V$, then the graph with edge matrix
$\mathcal{P}_{a|b}$ may contain additional edges compared to the graph with edge
matrix $\mathcal{B}_{ab}$ [see equation (2.8)]. Such edges are due to the
following type of path.

Definition 3.1. An $ij$-alternating path in the partial ancestor graph
$G^a_{\mathrm{anc}}$ is called active if of its inner nodes every collision node is
in $b$ and every source node is in $a$.

Thus, every off-diagonal one in the edge matrix (2.8) induced by a parent
graph for $\mathrm{inv}_a \Sigma^{-1}$ can be identified in the partial ancestor graph
by what we call its active paths. In diagrams of paths, we indicate nodes
within $a$ as crossed out and nodes within $b$ as boxed in, such as in Figure 2.

Definition 3.2. An $ij$-path in the partial ancestor graph $G^a_{\mathrm{anc}}$ is
active if it is an $ij$-edge or it is an active alternating path.
Proposition 2. For node set $V$, partitioned into $a$ and $b$ and having
node $i$ in $a$ and node $j$ in $b$, the induced graph with edge matrix
$\mathcal{P}_{a|b}$ has an $ij$-arrow if and only if there is an active $ij$-path in
the partial ancestor graph, with respect to $a$.

The essence of the proof is the expansion of the sum of products defining
$\mathcal{P}_{a|b}$ in equation (2.8) into submatrices of
$\mathcal{B} = \mathrm{zer}_a \tilde{\mathcal{A}}$ and the interpretation of each matrix
operation in terms of arrows present in $G^a_{\mathrm{anc}}$.

Proof of Proposition 2. From equation (2.8) defining $\mathcal{P}_{a|b}$, there
is an $ij$-one in $\mathcal{P}_{a|b}$ if and only if

$\mathcal{B}_{ij} = 1 \quad \text{or} \quad \mathcal{B}_{ia}\mathcal{B}_{ba}^{T}(I_{bb} + \mathcal{B}_{ba}\mathcal{B}_{ba}^{T})^{-}\mathcal{B}_{bj} \geq 1.$

The first condition $\mathcal{B}_{ij} = 1$ holds if there is an arrow pointing from
$j$ in $b$ to $i$ in $a$. The second condition holds if either an arrow points
from $a$ to $b$, or if $i$ and $j$ are uncoupled but connected by an active
alternating path. This interpretation of the second condition as an active
alternating path is illustrated with the following scheme, in which the
inner nodes are shown by their location in either $a$ or $b$:

Edge matrix: $\mathcal{B}_{ia}$, $\mathcal{B}_{ba}^{T}$, $(I_{bb} + \mathcal{B}_{ba}\mathcal{B}_{ba}^{T})^{-}$, $\mathcal{B}_{bj}$
Path: $i \leftarrow a \rightarrow b \leftarrow a \rightarrow \cdots \leftarrow a \rightarrow b \leftarrow j$

Such an $ij$-path induces a dependence of $Y_i$ on $Y_j$ given $Y_{b \setminus j}$ in
the relevant family of Gaussian distributions generated over
$G^V_{\mathrm{par}}$. □

The scheme shows that for the second condition, there has to be at least
one arrow in $G^a_{\mathrm{anc}}$ pointing from $a$ to $b$. Therefore, for any order
compatible split of $V$ into $(a, b)$ there is no active alternating path in
$G^a_{\mathrm{anc}}$.

Proposition 2 leads to the following path criterion.

Criterion 1. If there is no active path between $\alpha$ and $\beta$ in the partial
ancestor graph $G^a_{\mathrm{anc}}$, then $\alpha \perp\!\!\!\perp \beta \mid C$ in every
joint density generated over the
given parent graph.
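The matrix condition in the proof of Proposition 2 can be evaluated directly for small graphs. The sketch below is an illustration under stated assumptions and not the authors' implementation: the function name `regression_graph` and the dictionary encoding of parent sets are hypothetical, and the binary matrix $(I_{bb} + \mathcal{B}_{ba}\mathcal{B}_{ba}^{T})^{-}$ is computed as the transitive closure, within $b$, of the graph that links two nodes of $b$ whenever they share a source node in $a$.

```python
def regression_graph(parents, d, a):
    """Ones and zeros of P_{a|b}, returned as {(i, j): 0 or 1} for i in a, j in b.

    b is the complement of a in V = {1, ..., d}. `parents` maps node i to its
    parent nodes, all of which have larger labels (hypothetical encoding).
    """
    nodes = range(1, d + 1)
    a = [v for v in nodes if v in a]
    b = [v for v in nodes if v not in a]
    # Edge matrix of the parent graph: B[i][j] = 1 iff i <- j or i == j.
    B = {i: {j: int(i == j or j in parents.get(i, ())) for j in nodes} for i in nodes}
    # Partial closure with respect to a: close every a-line ancestor path.
    for k in a:
        for i in nodes:
            for j in nodes:
                if B[i][k] and B[k][j]:
                    B[i][j] = 1
    # In[I_bb + B_ba B_ba^T]: nodes of b are linked if they share a source in a.
    C = {h: {g: int(h == g or any(B[h][k] and B[g][k] for k in a)) for g in b}
         for h in b}
    # Structural zeros of the inverse: transitive closure within b.
    for k in b:
        for h in b:
            for g in b:
                if C[h][k] and C[k][g]:
                    C[h][g] = 1
    P = {}
    for i in a:
        for j in b:
            via_path = any(B[i][k] and B[h][k] and C[h][g] and B[g][j]
                           for k in a for h in b for g in b)
            P[i, j] = int(bool(B[i][j]) or via_path)
    return P


# Collision 2 -> 1 <- 3: conditioning on node 1 (b = {1, 3}) couples 2 and 3.
collider = {1: [2, 3]}
print(regression_graph(collider, 3, {2})[(2, 3)])
```

To query $\alpha \perp\!\!\!\perp \beta \mid C$, one would take $a = \alpha \cup M$, so that $b = \beta \cup C$, and inspect the entries of the result with $i$ in $\alpha$ and $j$ in $\beta$; a zero for every such pair corresponds to the absence of an active path in Criterion 1.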

Figure 3(a) shows a parent graph and illustrates the use of this criterion. 

4. Equivalence to known separation criteria. For a discussion of the
criteria available in the literature to verify whether
$\alpha \perp\!\!\!\perp \beta \mid C$ is implied by a given parent graph
$G^V_{\mathrm{par}}$, we take throughout the node set $V$ to be partitioned into
$M, \alpha, \beta, C$, where only $M$ and $C$ may be empty, and every node
pair between $\alpha$ and $\beta$ to be uncoupled in $G^V_{\mathrm{par}}$.
Fig. 3. Illustration of Criterion 1. (a) Parent graph. (b)-(d) Is
$\alpha \perp\!\!\!\perp \beta \mid C$ implied for $\alpha = \{5\}$, $\beta = \{7\}$ and
different choices of $C$; boxed nodes are those of $\beta \cup C$ and crossed-out
nodes those of $\alpha \cup M$. (b) $5 \perp\!\!\!\perp 7 \mid \{3,4\}$ not implied, since
the alternating path $(5,3,6,4,7)$ in $G^a_{\mathrm{anc}}$ is active with its inner
source node, 6, among the crossed-out nodes and its inner collision nodes,
3 and 4, among the boxed nodes; (c) $5 \perp\!\!\!\perp 7 \mid \{3,4,6\}$ implied, since
the inner source node 6 is among the boxed nodes; (d)
$5 \perp\!\!\!\perp 7 \mid \{2,3\}$ not implied, since the alternating path
$(5,3,6,2,7)$ in $G^a_{\mathrm{anc}}$ is active.

One known criterion uses definitions of a d-connecting path and of
d-separation, where the letter d is to remind us that the definitions pertain
to a directed acyclic graph. In $G^V_{\mathrm{par}}$, a path is said to be
d-connecting relative to $C$ if along it every inner collision node is in $C$
or has a descendant in $C$ and every other inner node is outside $C$. And, two
disjoint sets of nodes $\alpha$ and $\beta$ are said to be d-separated by $C$ if and
only if, relative to $C$, there is no d-connecting path between $\alpha$ and
$\beta$, that is, between a node in $\alpha$ and a node in $\beta$.

Criterion 2 (Geiger, Verma and Pearl [5]). If $\alpha$ and $\beta$ are d-separated
by $C$ in the parent graph, then $\alpha \perp\!\!\!\perp \beta \mid C$ in every joint
density generated over the given parent graph.

Proposition 3 below gives a matrix criterion that we will show to be
equivalent to d-separation. For this, we denote by $g = V \setminus C$ the union of
$\alpha$, $\beta$ and $M$, and by $\mathcal{F} = \mathrm{zer}_g \tilde{\mathcal{A}}$ the edge
matrix of $G^g_{\mathrm{anc}}$, the ancestor graph with respect to $g$. By equating
$a$ to $g$ and $b$ to $C$, equation (2.8) gives

$\mathcal{S}_{gg|C} = \mathrm{In}[\mathcal{F}_{gg}(I_{gg} + \mathcal{F}_{Cg}^{T}\mathcal{F}_{Cg})^{-}\mathcal{F}_{gg}^{T}]$

as the edge matrix induced by $\mathcal{A}$ for $\Sigma_{gg|C}$. Edges in the
corresponding undirected graph, named the induced covariance graph of $Y_g$
given $Y_C$ [10], are drawn later in Figure 5 as dashed lines.

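Proposition 3. In the parent graph $G^V_{\mathrm{par}}$, sets $\alpha$ and $\beta$ are
d-separated by $C$ if and only if $\mathcal{S}_{\alpha\beta|C} = 0$, where
$\mathcal{S}_{\alpha\beta|C}$ is the edge matrix induced by $\mathcal{A}$ for
$\Sigma_{\alpha\beta|C}$.

The following scheme shows that $\mathcal{S}_{\alpha\beta|C} = 0$ means the absence
of any active path in $G^g_{\mathrm{anc}}$ from $\alpha$ to $\beta$, both subsets of $g$:

Edge matrix: $\mathcal{F}_{\alpha g}$, $(I_{gg} + \mathcal{F}_{Cg}^{T}\mathcal{F}_{Cg})^{-}$, $\mathcal{F}_{\beta g}^{T}$
Path: $\alpha \leftarrow g \rightarrow C \leftarrow g \rightarrow \cdots \rightarrow C \leftarrow g \rightarrow \beta$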

Proof of Proposition 3. By the definition of partial inversion and
of $\mathcal{S}_{gg|C}$, a d-connecting $ij$-path in $G^V_{\mathrm{par}}$ relative to $C$
and without inner collision nodes forms in $G^g_{\mathrm{anc}}$ an $ij$-edge or a
sequence of three nodes $(i, s, j)$, with $s$ a source in $g$. Both types are
active paths in $G^g_{\mathrm{anc}}$.

If there is a d-connecting $ij$-path in $G^V_{\mathrm{par}}$, relative to $C$ and
having an inner collision node, then $\mathrm{zer}_g \tilde{\mathcal{A}}$ generates an
active alternating $ij$-path in $G^g_{\mathrm{anc}}$ as follows. Every inner source
node and every inner collision node within $C$ is preserved. Every inner
collision node $h$ outside $C$ is replaced by its first $g$-line descendant
$h_C$ within $C$. Every transition node $t$ in an inner node sequence
$(i, t, j)$ is removed via the $ij$-edge present in $G^g_{\mathrm{anc}}$.

Conversely, if there is an active alternating $ij$-path in $G^g_{\mathrm{anc}}$,
then, by these constructions, there is a d-connecting path relative to $C$ in
$G^V_{\mathrm{par}}$. □

Figure 4 shows a d-connecting path relative to the conditioning set $C$
(boxed nodes) and the corresponding active alternating path in
$G^g_{\mathrm{anc}}$ with $g$ the set of crossed-out nodes.

The other criterion in the literature uses an undirected graph called the
moral graph of $\alpha$, $\beta$ and $C$. This moral graph is constructed in three
steps. One obtains the subgraph induced by the union of the nodes $\alpha$,
$\beta$ and $C$ and their ancestors. One joins by a line every uncoupled pair of
parents having a common offspring. One replaces every arrow in the resulting
graph by a line. Then, the separation criterion for undirected graphs is used
to give Criterion 3. In the moral graph, $C$ separates $\alpha$ from $\beta$ if every
path from a node in $\alpha$ to one in $\beta$ has a node in $C$.

Criterion 3 (Lauritzen, Dawid, Larsen and Leimer [6]). If $\alpha$ and $\beta$ are
separated by $C$ in the moral graph of $\alpha, \beta, C$, then
$\alpha \perp\!\!\!\perp \beta \mid C$ in every joint
density generated over the given parent graph.
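The three construction steps of the moral graph and the subsequent separation check translate into a short procedure. The following is a minimal sketch of Criterion 3 for illustration; the function name `moral_separated` and the dictionary encoding of parent sets are assumptions, and a simple depth-first search stands in for whatever separation routine one prefers.

```python
def moral_separated(parents, alpha, beta, C):
    """Return True if C separates alpha from beta in the moral graph of alpha, beta, C.

    `parents` maps a node to its parent nodes (hypothetical encoding).
    """
    alpha, beta, C = set(alpha), set(beta), set(C)
    # Step 1: nodes of alpha, beta, C together with all their ancestors.
    keep = alpha | beta | C
    stack = list(keep)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in keep:
                keep.add(p)
                stack.append(p)
    # Steps 2 and 3: join parents with a common offspring, replace arrows by lines.
    nbrs = {v: set() for v in keep}
    for v in keep:
        pars = list(parents.get(v, ()))
        for p in pars:
            nbrs[v].add(p)
            nbrs[p].add(v)
            for q in pars:
                if p != q:
                    nbrs[p].add(q)
    # Separation: search from alpha without entering C; beta must not be reached.
    seen = set(alpha)
    stack = list(alpha)
    while stack:
        v = stack.pop()
        if v in beta:
            return False
        for w in nbrs[v]:
            if w not in seen and w not in C:
                seen.add(w)
                stack.append(w)
    return True


chain = {1: [2], 2: [3], 3: [4]}       # Markov chain 1 <- 2 <- 3 <- 4
print(moral_separated(chain, {1}, {4}, {3}))
```

For the Markov chain of Section 1, the check with $\alpha = \{1\}$, $\beta = \{4\}$ and $C = \{3\}$ corresponds to the induced statement of node 1 and node 4 given node 3 discussed there.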



Fig. 4. (a) Example of a d-connecting $ij$-path in $G^V_{\mathrm{par}}$ relative to
$C$ with inner nodes 5, 4, 9, 8, 7, 3, 6 and (b) the corresponding active
alternating $ij$-path in $G^g_{\mathrm{anc}}$ with inner nodes 5, 4, 9, 2, 6.
By the definition of the moral graph and by equating $b$ to $Q$, the union
of $\alpha$, $\beta$ and $C$ and their ancestors, and $a$ to $O = V \setminus Q$, equation
(2.8) gives the edge matrix of the moral graph of $\alpha, \beta, C$ as
$\mathcal{S}^{QQ.O}$, the edge matrix induced by $\mathcal{A}$ for $\Sigma^{QQ.O}$. Since
there is no path leading from $O$ to $Q$, this edge matrix has the special form

(4.1) $\mathcal{S}^{QQ.O} = \mathrm{In}[\mathcal{A}_{QQ}^{T}\mathcal{A}_{QQ}]$ due to $\mathcal{A}_{OQ} = 0$.

Thus, the induced graph contains an $ij$-edge if and only if in the parent
graph either there is an $ij$-edge, or $\mathcal{A}_{hi}\mathcal{A}_{hj} = 1$ for some
node $h < i < j$ in $Q$, that is, for nodes $i$ and $j$ having a common offspring
in $Q$.

Proposition 4 below gives a matrix criterion that we will show to be
equivalent to separation in the moral graph of $\alpha$, $\beta$ and $C$. For this,
we let $q = V \setminus M$, so that the set $q$ denotes the union of $\alpha$, $\beta$ and
$C$. Furthermore, we denote by $\mathcal{Z} = \mathrm{zer}_M \tilde{\mathcal{A}}$ the edge
matrix of the induced partial ancestor graph with respect to $M$. Then, by
equating $b$ to $q$ and $a$ to $M$, equation (2.8) gives, as edge matrix induced
by $\mathcal{A}$ for $\Sigma^{qq.M}$,

(4.2) $\mathcal{S}^{qq.M} = \mathrm{In}[\mathcal{Z}_{qq}^{T}(I_{qq} + \mathcal{Z}_{qr}\mathcal{Z}_{qr}^{T})^{-}\mathcal{Z}_{qq}],$

where $r$ denotes the set of ancestors of $q$ within $M$. Again, the special form
of $\mathcal{S}^{qq.M}$ is due to $\mathcal{A}_{Oq} = 0$. It leads to
$\mathcal{Z}_{qq} = [\mathrm{zer}_r \tilde{\mathcal{A}}]_{q,q}$ since $M = r \cup O$.
As a consequence, also $\mathcal{S}^{qq.M} = \mathcal{S}^{qq.r}$. Edges in this type of
undirected graph, named the induced concentration graph of $Y_q$, are drawn in
Figure 5 as full lines.

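Proposition 4. In the moral graph of $\alpha$, $\beta$ and $C$, set $\alpha$ is separated
from $\beta$ by $C$ if and only if $\mathcal{S}^{\alpha\beta.M} = 0$, where
$\mathcal{S}^{\alpha\beta.M}$ is the edge matrix induced by $\mathcal{A}$ for
$\Sigma^{\alpha\beta.M}$.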



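Fig. 5. Graphs induced by the parent graph in Figure 1(a), each of which
shows that $\alpha \perp\!\!\!\perp \beta \mid C$ holds for $\alpha = \{2\}$,
$\beta = \{3,4\}$ and $C = \{6\}$ by all edges between $\alpha$ and $\beta$ being
missing. The graph induced by $\mathcal{A}$ (a) with edge matrix $\mathcal{P}_{a|b}$ for
$a = V \setminus b$ and $b$ the union of $\beta$ and $C$; (b) with edge matrix
$\mathcal{S}_{gg|C}$ for $g = V \setminus C$; (c) with edge matrix $\mathcal{S}^{qq.M}$ for
$q = V \setminus M$. Panel labels: (a) $M \cup \alpha$ and $\beta \cup C$; (b)
$M \cup \alpha \cup \beta$; (c) $\alpha \cup \beta \cup C$.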
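The following scheme shows that $\mathcal{S}^{\alpha\beta.M} = 0$ means the absence
of any active path in $G^M_{\mathrm{anc}}$ from $\alpha$ to $\beta$, which are both
subsets of $q$:

Edge matrix: $\mathcal{Z}_{q\alpha}^{T}$, $(I_{qq} + \mathcal{Z}_{qr}\mathcal{Z}_{qr}^{T})^{-}$, $\mathcal{Z}_{q\beta}$
Path: $\alpha \rightarrow q \leftarrow r \rightarrow q \leftarrow \cdots \leftarrow r \rightarrow q \leftarrow \beta$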

Proof of Proposition 4. We show that the edge matrix 599-*^ can 
equivalently be obtained via edges and alternating paths in G^^^ and by 
closing all r-line paths in the moral graph for a,(3,C, which has the edge 
matrix S^^'^ in equation (4.1). 

For this, we note first that Q = qyjr and we recall that, for S^^'^, all 
collision-oriented V-configurations in the parent graph are closed, that have 
a common collision node in Q. Then, in the resulting concentration graph, all 
r-line paths are closed by partial inversion of S^^ '^, with respect to r. This 
gives, for the subgraph induced by nodes q, the edge matrix {zeTr<S'^'^'^)q^q. 

For the edge matrix S'^'^'^^ in (4.2), all r-line ancestor-descendant paths 
are closed first with Z = zer^ whereby every collision node within r, each 
of which has a g-line descendant in g, is replaced by the first descendant in q 
(see Figure 4 for an illustration). Then, the active alternating path in G^^^^ 
has every source node in r and every collision node in q. 

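Thus, $(\mathrm{zer}_r \mathcal{S}^{QQ.O})_{q,q} = \mathcal{S}^{qq.M}$, since for both edge matrices exactly the
following types of V-configurations are closed in the subgraph induced by $q$
in $G^{V}_{\mathrm{par}}$:

$$i \leftarrow r \leftarrow j, \qquad i \leftarrow r \rightarrow j \qquad \text{and} \qquad i \rightarrow Q \leftarrow j. \qquad \Box$$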

Our final result establishes the equivalence of the three path-based separation
criteria for $V$ partitioned as before into $M$, $\alpha$, $\beta$, $C$ and explains why
proofs of equivalence become complex when they are based exclusively on
paths that induce edges in different types of graph.

Proposition 5. The following assertions are equivalent. Between $\alpha$ and
$\beta$ there is:

(i) an active path in the partial ancestor graph with respect to $M$ and $\alpha$;

(ii) a d-connecting path relative to $C$ in the parent graph;

(iii) an $M$-line path in the moral graph of $\alpha$, $\beta$ and $C$.

Proof. By using Propositions 2 to 4, the result follows after noting
that

$$\mathcal{P}_{\alpha|\beta.C} = 0 \iff \mathcal{S}_{\alpha\beta|C} = 0 \iff \mathcal{S}^{\alpha\beta.M} = 0. \qquad \Box$$
Figure 5 illustrates the result of Proposition 5 for the parent graph of
Figure 1(a), with $\alpha = \{2\}$, $\beta = \{3,4\}$, $C = \{6\}$ and $M = \{1,5\}$. The independence
statement $2 \perp\!\!\!\perp \{3,4\} \mid 6$ is implied by the parent graph, since the
subgraph induced by nodes $\alpha$ and $\beta$ has no edge between $\alpha$ and $\beta$ in the
induced graph with edge matrix $\mathcal{P}_{a|b}$ in Figure 5(a), with edge matrix $\mathcal{S}_{gg|C}$
for $g = V \setminus C$ in Figure 5(b), and with edge matrix $\mathcal{S}^{qq.M}$ for $q = V \setminus M$ in
Figure 5(c).
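
Criterion (iii) can be sketched in a few lines of Python. The sketch below works with paths rather than with the edge matrices used in this paper: it forms the smallest ancestral set containing $\alpha$, $\beta$ and $C$, moralizes it, and searches for a path from $\alpha$ to $\beta$ that avoids $C$. The function names and the four-node graph `parents` are hypothetical illustrations; in particular, `parents` is not the parent graph of Figure 1(a).

```python
from collections import deque


def ancestral_set(parents, nodes):
    """Smallest set containing `nodes` together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def moral_graph(parents, keep):
    """Undirected adjacency sets: drop arrow directions and join common parents."""
    adj = {v: set() for v in keep}
    for child in keep:
        pars = [p for p in parents.get(child, ()) if p in keep]
        for p in pars:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(pars):
            for q in pars[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    return adj


def separated(parents, alpha, beta, C):
    """True if C separates alpha from beta in the moral graph of their ancestral set."""
    keep = ancestral_set(parents, set(alpha) | set(beta) | set(C))
    adj = moral_graph(parents, keep)
    blocked = set(C)
    queue = deque(set(alpha) - blocked)
    seen = set(queue)
    while queue:
        v = queue.popleft()
        if v in beta:
            return False
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                queue.append(w)
    return True


# Hypothetical parent graph: arrows 4 -> 2, 4 -> 3, 2 -> 1, 3 -> 1.
parents = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(separated(parents, {2}, {3}, {4}))     # True: 4 blocks the only path
print(separated(parents, {2}, {3}, {1, 4}))  # False: conditioning on the common offspring 1 joins 2 and 3
```

In the second call, node 1 enters the ancestral set, so its parents 2 and 3 are joined by moralization and the separation is lost.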

APPENDIX: OPERATORS AND LINEAR INDEPENDENCIES 

A.1. Partial inversion and partial closure. Two matrix operators, introduced
and studied by Wermuth, Wiedenbeck and Cox [12], permit stepwise
transformations of parameters in linear systems and edge matrices of
independence graphs, respectively. Note that M now denotes a matrix.

Let $M$ be a square matrix of dimension $d$, for which all principal
submatrices are invertible. For a given integer $1 \le k \le d$, partial inversion of
$M$ with respect to $k$ transforms $M$ into a matrix $N = \mathrm{inv}_k M$ of the same
dimensions, where, for all $i, j \ne k$,

$$
\begin{aligned}
N_{kk} &= 1/M_{kk}, & N_{ik} &= M_{ik}/M_{kk}, \\
N_{kj} &= -M_{kj}/M_{kk}, & N_{ij} &= M_{ij} - M_{ik} M_{kj}/M_{kk}.
\end{aligned}
\tag{A.1}
$$

Then the matrix $\mathcal{N}$ of structural zeros in $N = \mathrm{inv}_k M$ that remain after
partial inversion of $M$ on $k$ is defined by $\mathcal{N} = \mathrm{zer}_k \mathcal{M}$:

$$
\mathcal{N}_{kk} = 1, \qquad \mathcal{N}_{ik} = \mathcal{M}_{ik}, \qquad \mathcal{N}_{kj} = \mathcal{M}_{kj}, \qquad
\mathcal{N}_{ij} =
\begin{cases}
1 & \text{if } \mathcal{M}_{ij} = 1 \text{ or } \mathcal{M}_{ik}\mathcal{M}_{kj} = 1, \\
0 & \text{otherwise.}
\end{cases}
\tag{A.2}
$$

Partial inversion of $M$, with respect to a sequence of indices $a$, applies the
operator (A.1) in sequence to all elements of $a$ and is denoted by $\mathrm{inv}_a M$.
Similarly, partial closure (A.2) of $\mathcal{M}$, with respect to $a$, is denoted by $\mathrm{zer}_a \mathcal{M}$
and closes all $a$-line paths in the graph with edge matrix $\mathcal{M}$ [see (3.1)].

Partial inversion of $M$, with respect to all indices $V = \{1, \ldots, d\}$, gives
the inverse of $M$, and partial closure of $\mathcal{M}$, with respect to $V$, gives the edge
matrix $\mathcal{M}^{-}$ of the transitive closure of the graph with edge matrix $\mathcal{M}$,

$$\mathrm{inv}_V M = M^{-1}, \qquad \mathrm{zer}_V \mathcal{M} = \mathcal{M}^{-}$$

[see also equation (2.7)].
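
The two operators can be sketched directly from (A.1) and (A.2). The Python code below is a minimal illustration, not the authors' implementation: the function names are hypothetical, indices are 0-based, exact rational arithmetic is used for convenience, and the two small matrices are made-up examples.

```python
from fractions import Fraction


def partial_inversion(M, indices):
    """Apply (A.1) in sequence for each k in `indices`; returns a new matrix."""
    N = [[Fraction(x) for x in row] for row in M]
    d = len(N)
    for k in indices:
        pivot = N[k][k]
        if pivot == 0:
            raise ZeroDivisionError("pivot M_kk is zero")
        new = [[None] * d for _ in range(d)]
        for i in range(d):
            for j in range(d):
                if i == k and j == k:
                    new[i][j] = 1 / pivot
                elif j == k:
                    new[i][j] = N[i][k] / pivot
                elif i == k:
                    new[i][j] = -N[k][j] / pivot
                else:
                    new[i][j] = N[i][j] - N[i][k] * N[k][j] / pivot
        N = new
    return N


def partial_closure(E, indices):
    """Apply (A.2) in sequence for each k in `indices` to a binary edge matrix."""
    N = [row[:] for row in E]
    d = len(N)
    for k in indices:
        new = [row[:] for row in N]   # row k and column k are kept
        new[k][k] = 1
        for i in range(d):
            for j in range(d):
                if i != k and j != k:
                    closed = N[i][j] == 1 or N[i][k] * N[k][j] == 1
                    new[i][j] = 1 if closed else 0
        N = new
    return N


# Made-up positive definite matrix: inverting on all indices gives the inverse.
M = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
M_inv = partial_inversion(M, [0, 1, 2])

# Made-up edge matrix of the chain 1 <- 2 <- 3 (ones on the diagonal).
E = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
E_closed = partial_closure(E, [0, 1, 2])
print(E_closed)  # [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
```

Closing on all nodes adds the single ancestor edge from node 3 to node 1, which is the transitive closure of the chain.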

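Both operators are commutative; that is, for three disjoint subsets $a$, $b$
and $c$ of $V$,

$$\mathrm{inv}_a\, \mathrm{inv}_b M = \mathrm{inv}_b\, \mathrm{inv}_a M, \qquad \mathrm{zer}_a\, \mathrm{zer}_b \mathcal{M} = \mathrm{zer}_b\, \mathrm{zer}_a \mathcal{M},$$

but partial inversion can be undone while partial closure cannot,

$$\mathrm{inv}_{ab}\, \mathrm{inv}_{bc} M = \mathrm{inv}_{ac} M, \qquad \mathrm{zer}_{ab}\, \mathrm{zer}_{bc} \mathcal{M} = \mathrm{zer}_{abc} \mathcal{M}.$$

For $V = \{a, b\}$, $M$ partially inverted on $a$ coincides with $M^{-1}$ partially
inverted on $b$,

$$\mathrm{inv}_a M = \mathrm{inv}_b M^{-1}. \tag{A.3}$$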
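
These properties can be checked numerically on small examples. The sketch below restates the two operators compactly so that it runs on its own; the names `inv` and `zer`, the 0-based index sets and the two test matrices are assumptions made for illustration only.

```python
from fractions import Fraction


def inv(M, idx):
    """Partial inversion (A.1) on each index in idx, in sequence."""
    N = [[Fraction(x) for x in row] for row in M]
    n = len(N)
    for k in idx:
        p = N[k][k]
        N = [[1 / p if (i == k and j == k)
              else N[i][k] / p if j == k
              else -N[k][j] / p if i == k
              else N[i][j] - N[i][k] * N[k][j] / p
              for j in range(n)] for i in range(n)]
    return N


def zer(E, idx):
    """Partial closure (A.2) on each index in idx, in sequence."""
    N = [row[:] for row in E]
    n = len(N)
    for k in idx:
        N = [[1 if (i == k and j == k)
              else N[i][j] if (i == k or j == k)
              else int(N[i][j] == 1 or N[i][k] * N[k][j] == 1)
              for j in range(n)] for i in range(n)]
    return N


a, b, c = [0], [1], [2]

# Made-up positive definite matrix.
M = [[5, 2, 1], [2, 6, 2], [1, 2, 7]]
commutes = inv(inv(M, b), a) == inv(inv(M, a), b)
undone = inv(inv(M, b + c), a + b) == inv(M, a + c)
# (A.3) with V split into a and the union of b and c.
a3_holds = inv(M, a) == inv(inv(M, a + b + c), b + c)

# Made-up edge matrix of a chain on four nodes.
E = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
zer_commutes = zer(zer(E, b), a) == zer(zer(E, a), b)
zer_absorbs = zer(zer(E, b + c), a + b) == zer(E, a + b + c)
not_undone = zer(zer(E, b), b) == zer(E, b) and zer(E, b) != E

print(commutes, undone, a3_holds, zer_commutes, zer_absorbs, not_undone)
```

The last check shows the asymmetry stated above: applying partial closure on $b$ a second time leaves the closed matrix unchanged instead of returning the original edge matrix.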

A.2. Zero partial regression coefficients and independence. A Gaussian
random vector $Y$ has a nondegenerate distribution if its covariance matrix $\Sigma$
is positive definite. From $\Sigma$ partitioned as $(a, b, c, d)$, the conditional
covariance matrix $\Sigma_{ab|c}$ of $Y_a$ and $Y_b$ given $Y_c$ is obtained by partially inverting
$\Sigma$ with respect to $c$,

$$\Sigma_{ab|c} = \Sigma_{ab} - \Sigma_{ac} \Sigma_{cc}^{-1} \Sigma_{cb}.$$

The least-squares linear predictor of $Y_a$ given $Y_b$ is $\Pi_{a|b} Y_b$ with $\Pi_{a|b} =
\Sigma_{ab} \Sigma_{bb}^{-1}$. For prediction of $Y_a$ with $Y_b$ and $Y_c$, the matrix of the least-squares
regression coefficients $\Pi_{a|bc}$ is partitioned as

$$\Pi_{a|bc} = (\Pi_{a|b.c} \;\; \Pi_{a|c.b}) = (\Sigma_{ab|c} \Sigma_{bb|c}^{-1} \;\; \Sigma_{ac|b} \Sigma_{cc|b}^{-1}), \tag{A.4}$$

where, for example, $\Pi_{a|b.c} Y_b$ predicts $Y_a$ with $Y_b$ when both $Y_a$ and $Y_b$ are
adjusted for linear dependence on $Y_c$. Equation (A.4) is generalized by

$$\Pi_{a|bc.d} = (\Pi_{a|b.cd} \;\; \Pi_{a|c.bd}) = (\Sigma_{ab|cd} \Sigma_{bb|cd}^{-1} \;\; \Sigma_{ac|bd} \Sigma_{cc|bd}^{-1}). \tag{A.5}$$

By property (A.3) of partial inversion,

$$(\Pi_{a|b.cd} \;\; \Pi_{a|c.bd}) = -(\Sigma^{aa})^{-1} (\Sigma^{ab} \;\; \Sigma^{ac}),$$

where $\Sigma^{aa}$, $\Sigma^{ab}$ and $\Sigma^{ac}$ are submatrices of the concentration matrix $\Sigma^{-1}$.
From $\Sigma^{-1}$, the concentration matrix $\Sigma^{bc.a}$ of $Y_b$ and $Y_c$ after marginalizing
over $Y_a$ is obtained by partially inverting $\Sigma^{-1}$ with respect to $a$.

A recursive relation for matrices of least-squares regression coefficients
generalizes a result due to Cochran [1],

$$\Pi_{a|b.cd} = \Pi_{a|b.c} - \Pi_{a|d.bc}\, \Pi_{d|b.c},$$

and is obtained with partial inversion of $\Sigma$ first, with respect to $b$, $c$ and $d$.
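
A small numerical check may help to see how these quantities arise from partial inversion. In the sketch below, each of $a$, $b$, $c$, $d$ is a single variable, so that all blocks are scalars; the covariance matrix `Sigma`, the function name and the 0-based indexing are assumptions chosen for illustration.

```python
from fractions import Fraction


def partial_inversion(M, idx):
    """Partial inversion (A.1) on each index in idx, in sequence."""
    N = [[Fraction(x) for x in row] for row in M]
    n = len(N)
    for k in idx:
        p = N[k][k]
        N = [[1 / p if (i == k and j == k)
              else N[i][k] / p if j == k
              else -N[k][j] / p if i == k
              else N[i][j] - N[i][k] * N[k][j] / p
              for j in range(n)] for i in range(n)]
    return N


# Made-up positive definite covariance matrix; a, b, c, d are single indices.
Sigma = [[5, 2, 1, 1], [2, 6, 2, 1], [1, 2, 7, 2], [1, 1, 2, 8]]
a, b, c, d = 0, 1, 2, 3

# Conditional covariance of Y_a and Y_b given Y_c: entry (a, b) after inverting on c.
S_c = partial_inversion(Sigma, [c])
sigma_ab_c = S_c[a][b]
direct = Fraction(Sigma[a][b]) - Fraction(Sigma[a][c]) * Sigma[c][b] / Sigma[c][c]

# Regression coefficients appear in the rows of the partially inverted matrix.
S_bc = partial_inversion(Sigma, [b, c])
S_bcd = partial_inversion(Sigma, [b, c, d])
pi_a_b_c = S_bc[a][b]    # coefficient of Y_b when Y_a is regressed on Y_b, Y_c
pi_d_b_c = S_bc[d][b]    # coefficient of Y_b when Y_d is regressed on Y_b, Y_c
pi_a_b_cd = S_bcd[a][b]  # coefficient of Y_b when Y_a is regressed on Y_b, Y_c, Y_d
pi_a_d_bc = S_bcd[a][d]  # coefficient of Y_d in the same regression

print(sigma_ab_c == direct)                               # True
print(pi_a_b_c == S_c[a][b] / S_c[b][b])                  # True, scalar case of (A.4)
print(pi_a_b_cd == pi_a_b_c - pi_a_d_bc * pi_d_b_c)       # True, the recursive relation
```

The last line is exactly one further step of (A.1): inverting `S_bc` on $d$ subtracts the product of the entries in column $d$ and row $d$, divided by the pivot.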
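Lemma A.1. For nondegenerate Gaussian distributions, linear and probabilistic
independencies combine equivalently as follows:

(i) symmetry: $\Pi_{a|b.c} = 0$ implies $\Pi_{b|a.c} = 0$ $\iff$ $a \perp\!\!\!\perp b \mid c$ implies $b \perp\!\!\!\perp a \mid c$;

(ii) decomposition: $\Pi_{a|bc.d} = 0$ implies $\Pi_{a|b.d} = 0$ $\iff$ $a \perp\!\!\!\perp bc \mid d$ implies
$a \perp\!\!\!\perp b \mid d$;

(iii) weak union: $\Pi_{a|bc.d} = 0$ implies $\Pi_{a|b.cd} = 0$ $\iff$ $a \perp\!\!\!\perp bc \mid d$ implies $a \perp\!\!\!\perp b \mid cd$;

(iv) contraction: $\Pi_{a|b.c} = 0$ and $\Pi_{a|d.bc} = 0$ imply $\Pi_{a|bd.c} = 0$ $\iff$ $a \perp\!\!\!\perp b \mid c$
and $a \perp\!\!\!\perp d \mid bc$ imply $a \perp\!\!\!\perp bd \mid c$;

(v) intersection: $\Pi_{a|b.cd} = 0$ and $\Pi_{a|c.bd} = 0$ imply $\Pi_{a|bc.d} = 0$ $\iff$ $a \perp\!\!\!\perp b \mid cd$
and $a \perp\!\!\!\perp c \mid bd$ imply $a \perp\!\!\!\perp bc \mid d$;

(vi) composition: $\Pi_{a|c.d} = 0$ and $\Pi_{b|c.d} = 0$ imply $\Pi_{ab|c.d} = 0$ $\iff$ $a \perp\!\!\!\perp c \mid d$
and $b \perp\!\!\!\perp c \mid d$ imply $ab \perp\!\!\!\perp c \mid d$.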

Proof. Definition (A.4) implies that $\Pi_{a|b.c}$ vanishes if and only if $\Sigma_{ab|c} =
0$, so that (i) results by the symmetry of the conditional covariance matrix.
Property (ii) follows by noting that $\Pi_{a|bc.d} = 0$ is equivalent to the vanishing
of both $\Sigma_{ab|d}$ and $\Sigma_{ac|d}$ and thus also of $\Pi_{a|b.d} = \Sigma_{ab|d} \Sigma_{bb|d}^{-1}$. Properties (iii)
and (v) are direct consequences of (A.5), while (iv) follows with equation
(A.6). Finally, (vi) is another consequence of the definition (A.4) and the
equality $(\Pi_{a|b})_{\alpha,\beta} = \Pi_{\alpha|\beta.C}$.

The proof is completed by the equivalence of linear and probabilistic
independence statements in Gaussian distributions. □
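
Property (i) can be illustrated with a three-variable example in which the concentration matrix has a zero in position $(a, b)$. The sketch below is only a numerical check under stated assumptions: the concentration matrix `K` is made up, $a$, $b$, $c$ are the single indices 0, 1, 2, and the function name is hypothetical.

```python
from fractions import Fraction


def partial_inversion(M, idx):
    """Partial inversion (A.1) on each index in idx, in sequence."""
    N = [[Fraction(x) for x in row] for row in M]
    n = len(N)
    for k in idx:
        p = N[k][k]
        N = [[1 / p if (i == k and j == k)
              else N[i][k] / p if j == k
              else -N[k][j] / p if i == k
              else N[i][j] - N[i][k] * N[k][j] / p
              for j in range(n)] for i in range(n)]
    return N


a, b, c = 0, 1, 2

# Made-up positive definite concentration matrix with a zero for the pair (a, b).
K = [[2, 0, 1], [0, 2, 1], [1, 1, 3]]
Sigma = partial_inversion(K, [a, b, c])          # covariance matrix, the inverse of K

sigma_ab = Sigma[a][b]                           # marginal covariance of Y_a and Y_b
sigma_ab_c = partial_inversion(Sigma, [c])[a][b] # conditional covariance given Y_c
pi_a_b_c = partial_inversion(Sigma, [b, c])[a][b]
pi_b_a_c = partial_inversion(Sigma, [a, c])[b][a]

print(sigma_ab)                        # 1/8: marginally dependent
print(sigma_ab_c, pi_a_b_c, pi_b_a_c)  # 0 0 0
```

Both partial regression coefficients vanish together with the conditional covariance, although the marginal covariance of $Y_a$ and $Y_b$ is not zero.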